# Disentangled Attention Mechanism

## Deberta Xlarge Mnli

- Author: microsoft · License: MIT · Downloads: 833.58k · Likes: 19
- Tags: Large Language Model, Transformers, English

DeBERTa-XLarge-MNLI is an enhanced BERT model based on the disentangled attention mechanism, fine-tuned on the MNLI task. With 750M parameters, it excels in natural language understanding tasks.
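Since every MNLI entry in this list is used the same way, here is a minimal premise/hypothesis inference sketch. It assumes the Hugging Face hub id `microsoft/deberta-xlarge-mnli` and the `transformers` and `torch` packages; the label names come from the checkpoint's own config, not from anything in this listing.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/deberta-xlarge-mnli"  # hub id assumed from the entry above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the premise/hypothesis pair as a single sequence.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# id2label is read from the model config (MNLI: contradiction/neutral/entailment).
probs = logits.softmax(dim=-1).squeeze()
for idx, p in enumerate(probs.tolist()):
    print(model.config.id2label[idx], round(p, 3))
```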
## Deberta Large

- Author: microsoft · License: MIT · Downloads: 15.07k · Likes: 16
- Tags: Large Language Model, Transformers, English

DeBERTa is an improved BERT model that enhances performance through a disentangled attention mechanism and an enhanced masked decoder, surpassing BERT and RoBERTa on multiple natural language understanding tasks.
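The "disentangled attention mechanism" these entries keep referencing scores each token pair as a sum of content-to-content, content-to-position, and position-to-content terms over relative positions. Below is an illustrative sketch of that score computation, following the formulation in the DeBERTa paper; all tensor and function names are invented for the example and do not mirror any library internals.

```python
import math
import torch

d, L, k = 64, 8, 4                 # head dim, sequence length, max relative distance
H = torch.randn(L, d)              # content states
P = torch.randn(2 * k, d)          # relative position embeddings

Wq_c, Wk_c = torch.randn(d, d), torch.randn(d, d)   # content projections
Wq_r, Wk_r = torch.randn(d, d), torch.randn(d, d)   # position projections

def rel_bucket(i, j):
    # Clamp the relative distance i - j into [0, 2k), as in the paper.
    return max(0, min(2 * k - 1, i - j + k))

Qc, Kc = H @ Wq_c, H @ Wk_c        # content queries/keys
Qr, Kr = P @ Wq_r, P @ Wk_r        # relative-position queries/keys

A = torch.zeros(L, L)
for i in range(L):
    for j in range(L):
        c2c = Qc[i] @ Kc[j]                      # content-to-content
        c2p = Qc[i] @ Kr[rel_bucket(i, j)]       # content-to-position
        p2c = Kc[j] @ Qr[rel_bucket(j, i)]       # position-to-content
        A[i, j] = (c2c + c2p + p2c) / math.sqrt(3 * d)

attn = A.softmax(dim=-1)           # standard softmax over key positions
print(attn.shape)                  # torch.Size([8, 8])
```

The 1/sqrt(3d) scaling reflects that three dot-product terms, rather than one, are summed per position.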
## Deberta V2 Xxlarge Mnli

- Author: microsoft · License: MIT · Downloads: 4,077 · Likes: 8
- Tags: Large Language Model, Transformers, English

DeBERTa V2 XXLarge is an enhanced BERT variant based on the disentangled attention mechanism, fine-tuned here on the MNLI task. With 1.5 billion parameters, it surpasses RoBERTa and XLNet on natural language understanding tasks.
## Deberta Base

- Author: microsoft · License: MIT · Downloads: 298.78k · Likes: 78
- Tags: Large Language Model, English

DeBERTa is an improved BERT model based on the disentangled attention mechanism and an enhanced masked decoder, excelling in multiple natural language understanding tasks.
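For base checkpoints like this one, which are not fine-tuned on a downstream task, a typical use is plain feature extraction. A minimal sketch, assuming the hub id `microsoft/deberta-base` and the `transformers` and `torch` packages:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("DeBERTa disentangles content and position.", return_tensors="pt")
with torch.no_grad():
    # Contextual embeddings for each token: (batch, seq_len, hidden_size).
    hidden = model(**inputs).last_hidden_state
print(hidden.shape)
```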
## Deberta V2 Xxlarge

- Author: microsoft · License: MIT · Downloads: 9,179 · Likes: 33
- Tags: Large Language Model, Transformers, English

DeBERTa V2 XXLarge is an improved BERT model based on disentangled attention and an enhanced masked decoder. With 1.5 billion parameters, it surpasses BERT and RoBERTa on multiple natural language understanding tasks.
## Deberta V2 Xlarge Mnli

- Author: microsoft · License: MIT · Downloads: 51.59k · Likes: 9
- Tags: Large Language Model, Transformers, English

DeBERTa V2 XLarge MNLI is Microsoft's DeBERTa V2 XLarge fine-tuned on the MNLI task. The architecture improves on BERT through a disentangled attention mechanism and an enhanced masked decoder, outperforming BERT and RoBERTa on multiple NLU tasks.
## Deberta V2 Xlarge

- Author: microsoft · License: MIT · Downloads: 116.71k · Likes: 22
- Tags: Large Language Model, Transformers, English

DeBERTa V2 XLarge is an enhanced natural language understanding model developed by Microsoft, which improves the BERT architecture through a disentangled attention mechanism and an enhanced masked decoder, achieving state-of-the-art performance on multiple NLP tasks.
## Deberta Xlarge

- Author: microsoft · License: MIT · Downloads: 312 · Likes: 2
- Tags: Large Language Model, Transformers, English

DeBERTa improves on BERT and RoBERTa with a disentangled attention mechanism and an enhanced masked decoder, demonstrating superior performance on most natural language understanding tasks.
## Deberta Large Mnli

- Author: microsoft · License: MIT · Downloads: 1.4M · Likes: 18
- Tags: Large Language Model, Transformers, English

DeBERTa-Large-MNLI is an improved BERT model based on the disentangled attention mechanism and an enhanced masked decoder, fine-tuned on the MNLI task and excelling in multiple natural language understanding tasks.
## Deberta V3 Small

- Author: microsoft · License: MIT · Downloads: 189.23k · Likes: 51
- Tags: Large Language Model, Transformers, English

DeBERTa-v3 is an improved natural language understanding model developed by Microsoft, optimized through ELECTRA-style pretraining and gradient-disentangled embedding sharing, achieving strong performance while keeping the parameter count relatively small.
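One practical note for the DeBERTa-v3 family: its tokenizer is SentencePiece-based, so the `sentencepiece` package must be installed alongside `transformers`. A minimal loading sketch, assuming the hub id `microsoft/deberta-v3-small`:

```python
from transformers import AutoModel, AutoTokenizer

# Loading the v3 tokenizer fails without the sentencepiece package installed.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
model = AutoModel.from_pretrained("microsoft/deberta-v3-small")

# Inspect the SentencePiece subword segmentation.
print(tokenizer.tokenize("gradient-disentangled embedding sharing"))
```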
## Deberta V3 Xsmall

- Author: microsoft · License: MIT · Downloads: 87.40k · Likes: 43
- Tags: Large Language Model, Transformers, English

DeBERTaV3 is Microsoft's improved version of DeBERTa, which gains efficiency from ELECTRA-style pretraining with gradient-disentangled embedding sharing and demonstrates excellent performance on natural language understanding tasks.
## Deberta V2 Xlarge

- Author: kamalkraj · License: MIT · Downloads: 302 · Likes: 0
- Tags: Large Language Model, Transformers, English

DeBERTa is a decoding-enhanced BERT model with disentangled attention; its improved attention mechanism and enhanced masked decoder let it surpass BERT and RoBERTa on multiple natural language understanding tasks.
## Deberta Base

- Author: kamalkraj · License: MIT · Downloads: 287 · Likes: 0
- Tags: Large Language Model, Transformers, English

DeBERTa is a decoding-enhanced BERT model with disentangled attention that improves on BERT and RoBERTa and excels in natural language understanding tasks.
## Deberta Base Mnli

- Author: microsoft · License: MIT · Downloads: 96.92k · Likes: 6
- Tags: Large Language Model, English

A decoding-enhanced BERT model with disentangled attention, fine-tuned on the MNLI task.
## Deberta V3 Large Mnli

- Author: khalidalt · Downloads: 150 · Likes: 5
- Tags: Text Classification, Transformers, English

A DeBERTa-v3-large model trained on the MultiNLI dataset for textual entailment (NLI) classification.
## V3large 2epoch

- Author: NDugar · License: MIT · Downloads: 31 · Likes: 0
- Tags: Large Language Model, Transformers, English

DeBERTa is an improved BERT model based on the disentangled attention mechanism. With 160GB of training data and 1.5 billion parameters, it surpasses the performance of BERT and RoBERTa on multiple natural language understanding tasks.
## 1epochv3

- Author: NDugar · License: MIT · Downloads: 28 · Likes: 0
- Tags: Large Language Model, Transformers, English

DeBERTa is an enhanced BERT model based on the disentangled attention mechanism, surpassing BERT and RoBERTa on multiple natural language understanding tasks.
## Debertav3 Mnli Snli Anli

- Author: NDugar · Downloads: 26 · Likes: 3
- Tags: Large Language Model, Transformers, English

DeBERTa is a decoding-enhanced BERT model with disentangled attention, which improves on BERT and RoBERTa and performs better on most natural language understanding tasks.
## V3large 1epoch

- Author: NDugar · License: MIT · Downloads: 32 · Likes: 0
- Tags: Large Language Model, Transformers, English

DeBERTa is a decoding-enhanced BERT model based on the disentangled attention mechanism, excelling in natural language understanding tasks.
## Deberta Large Mnli Zero Cls

- Author: Narsil · License: MIT · Downloads: 51.27k · Likes: 14
- Tags: Large Language Model, Transformers, English

DeBERTa is a decoding-enhanced BERT model based on the disentangled attention mechanism, surpassing BERT and RoBERTa on multiple natural language understanding tasks by improving the attention mechanism and masked decoder.
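MNLI fine-tunes like this one are commonly wrapped in the `transformers` zero-shot classification pipeline, which turns each candidate label into an entailment hypothesis. A sketch, assuming the hub id `Narsil/deberta-large-mnli-zero-cls` (inferred from the entry above); `microsoft/deberta-large-mnli` should work the same way:

```python
from transformers import pipeline

# Zero-shot classification reuses the MNLI entailment head under the hood.
classifier = pipeline("zero-shot-classification",
                      model="Narsil/deberta-large-mnli-zero-cls")

result = classifier(
    "The new GPU doubles training throughput.",
    candidate_labels=["hardware", "cooking", "politics"],
)
# Labels are returned sorted by score, highest first.
print(result["labels"][0], round(result["scores"][0], 3))
```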
## V2xl Again Mnli

- Author: NDugar · License: MIT · Downloads: 30 · Likes: 0
- Tags: Large Language Model, Transformers, English

DeBERTa is a decoding-enhanced BERT model based on the disentangled attention mechanism. By improving the attention mechanism and the masked decoder, it surpasses BERT and RoBERTa on multiple natural language understanding tasks.